Method and apparatus for determining digital media visibility

ABSTRACT

A method of determining whether media displayed on a web page in a web browser is visible, comprising: determining whether the web page comprises at least one frame; and, if the web page comprises at least one frame, using information relating to the activity of a media player displaying the media to determine whether the media is visible.

TECHNICAL FIELD

The present invention relates to methods and apparatus for determining the visibility of media on a web page shown within a web browser.

BACKGROUND OF THE INVENTION

As a result of the increase in the number of computing devices available to users, the proportion of media viewed by the public through an internet connection has increased. It is expected in the coming years that this proportion will increase further, so that a significant proportion of the media viewed by users will be viewed on electronic devices such as laptops, netbooks, tablets and mobile phones through interfaces such as web browsers.

The introduction of Personal Video Recorders (PVRs) and Digital Set-Top Boxes in recent years has decreased the effectiveness of advertisements within such traditional areas as television “ad-breaks”. The placement of advertisements to capture the emerging mechanisms for viewing media has therefore been a subject of interest within the advertising industry in recent years.

In particular, digital video advertising is increasingly used. An advertiser or advertising agency will create media, typically in the form of an advertisement, i.e. a digital video. The advertisement is distributed by a publisher who delivers the digital video content to positions within web pages to be viewed by a user. It is common for an advertiser to pay the publisher per instance of the digital video content delivered, in other terms per “impression”. However, in order for the advertiser to be confident that they are receiving value for money, it is important that the digital video content is provided to the viewer in a way that enables the viewer to view the media, for example in the form of digital video content.

In order to ascertain the effectiveness, and therefore the value, of an advertising campaign directed to media obtained through an internet connection, suitable metrics are required. Such metrics may include an indication of whether media, such as an advertisement, appears within the viewable window of a web browser when the web page is initially loaded. For example, the digital video content may appear in the active window of a web browser, but may be located in a part of the page that is below the bottom edge of the browser window when the web page is first loaded. If this is the case, the user must scroll down the window in order to view the media. Media that appears below or to the side of the browser window such that it is not visible when the web page first loads is known as “below-the-fold” media. Conversely, media that is visible when the web page first loads is known as “above-the-fold” media.

It is known to identify whether media is “below-the-fold” by obtaining the browser window height and width, calculating the co-ordinates of the location of the media on the web page (in pixels from the top left of the web page) and dividing the co-ordinates by the browser window dimensions to calculate the number of times the user will need to scroll a complete browser window to arrive at the digital video content. If this number is less than one, the digital video content is considered “above-the-fold” and therefore visible when the web page is first loaded.
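The known calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed method: the function names are hypothetical, the co-ordinates are assumed to be the top-left corner of the media in page pixels, and taking the larger of the horizontal and vertical ratios is an assumption about how the two dimensions are combined.

```python
def scrolls_to_reach(media_x: int, media_y: int,
                     window_width: int, window_height: int) -> float:
    """Number of complete browser windows to scroll before reaching the media.

    media_x, media_y: pixels from the top-left corner of the web page
    (hypothetical inputs; the real values come from the browser).
    """
    if window_width <= 0 or window_height <= 0:
        raise ValueError("window dimensions must be positive")
    # Divide each co-ordinate by the matching window dimension.
    vertical = media_y / window_height
    horizontal = media_x / window_width
    # Assumption: the larger ratio governs whether the media is reachable
    # without scrolling.
    return max(vertical, horizontal)


def is_above_the_fold(media_x: int, media_y: int,
                      window_width: int, window_height: int) -> bool:
    """Above the fold if less than one full window of scrolling is needed."""
    return scrolls_to_reach(media_x, media_y, window_width, window_height) < 1.0
```

For example, with an 800-pixel-high window, media whose top edge is 900 pixels down the page needs 1.125 windows of scrolling and is therefore treated as “below-the-fold”.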

It is not currently possible to obtain metrics for a large census of a particular advertising campaign due to the limitations of known approaches which utilise page analysis. Such techniques cannot be used in website designs where the advertisements are positioned on web pages by use of framing; for example, by using the <iframe> tag within HTML. Since the position of the media is related to the frame rather than to the web page within which the frame is located, the known technique above cannot be used.

SUMMARY

According to a first aspect of the present invention, there is provided a method of determining whether media displayed on a web page in a web browser is visible to a user, the method comprising:

determining whether the media is displayed in at least one frame of the web page; and if the media is displayed in at least one frame, monitoring and analysing activity information from a user terminal to determine whether the media is visible to the user.

Preferably, said activity information comprises information relating to the activity of a media player.

Preferably, the activity information is input into a numerical model; and wherein the model provides an estimation of whether the media is visible based upon the information relating to the activity of the media player.

Preferably, the numerical model comprises a probabilistic model.

Preferably, the numerical model comprises a regression analysis.

Preferably, the coefficients of the numerical model are determined using training data.

Preferably, the activity information relates to the activity of the media player and comprises a frame rate of the media player.

Preferably, the activity information relates to the activity of the media player.

Preferably, the activity information comprises data on a sleep mode of the media player.

Preferably, determining whether the media player is in a sleep mode comprises determining whether the frame rate is below a predetermined threshold.

Preferably, the activity information comprises total time that the media player spent in sleep mode.

Preferably, the activity information comprises the percentage of the total monitored time spent in sleep mode.

Preferably, the activity information comprises the number of transitions of the media player to sleep mode.

Preferably, the activity information comprises data on an active mode of the media player.

Preferably, determining whether the media player is in an active mode comprises determining whether the frame rate is above a predetermined threshold.

Preferably, the activity information comprises total time that the media player spent in active mode.

Preferably, the activity information comprises the number of transitions of the media player to active mode.

Preferably, the activity information comprises the percentage of the total monitored time spent in active mode.

Preferably, the media is digital media. Preferably, the media is digital video media. Preferably, the media is rich media. Preferably, the media is an advertisement.

Preferably, if the web page does not comprise at least one frame, the determination of whether the media is visible uses page analysis.

Preferably, page analysis comprises:

determining the relative location of the media on the web page; and determining from the relative location of the media whether the media is visible.

Preferably, page analysis comprises determining the dimensions of the active window of the web browser.

Preferably, the activity information is stored with information relating to the results of page analysis to form training data for the model, when page analysis is performed.

Preferably, the stored information is used to calibrate coefficients of the model.

According to a second aspect of the invention, there is provided a method of calibrating a model for determining whether media displayed on a web page in a web browser is visible when the web page comprises at least one frame;

the method comprising using information obtained from at least one media displayed in at least one web page which does not comprise at least one frame.

Preferably, the information obtained from the at least one media displayed in at least one web page which does not comprise at least one frame comprises information obtained by page analysis.

Preferably, page analysis comprises:

determining the relative location of the media on the web page; and determining from the relative location of the media whether the media is visible.

Preferably, the page analysis comprises determining the dimensions of the active window of the web browser.

Preferably, the method comprises monitoring and analysing activity information from a user terminal.

Preferably, the activity information is input into the model; and wherein the model provides an estimation of whether the media is visible based upon the activity information.

Preferably, the numerical model comprises a probabilistic model.

Preferably, the numerical model comprises generalised linear regression analysis.

Preferably, the coefficients of the numerical model are determined using training data.

Preferably, the coefficients are updated intermittently.

Preferably, the activity information comprises data on a frame rate of the media player.

Preferably, the activity information comprises data on a sleep mode of a media player.

Preferably, determining whether the media player is in a sleep mode comprises determining whether the frame rate is below a predetermined threshold.

Preferably, the activity information comprises the number of transitions of the media player to sleep mode.

Preferably, the activity information comprises the percentage of the total monitored time spent in sleep mode.

Preferably, the activity information comprises data on an active mode of a media player.

Preferably, determining whether the media player is in an active mode comprises determining whether the frame rate is above a predetermined threshold.

Preferably, the activity information comprises total time that the media player spent in active mode.

Preferably, the activity information comprises the number of transitions of the media player to active mode.

Preferably, the activity information comprises the percentage of the total monitored time spent in active mode.

Preferably, the media is digital media.

Preferably, the media is digital video media.

Preferably, the media is rich media.

Preferably, the media is an advertisement.

According to a third aspect of the invention, there is provided a computer readable medium bearing computer code which, when executed by a processor of a device, causes the device to implement the method of either of the first and second aspects.

According to a fourth aspect of the invention, there is provided a data processing apparatus configured to implement the method of either of the first and second aspects.

According to a fifth aspect of the invention, there is provided a computer apparatus arranged to determine whether media displayed in a window of a web browser on a remote user terminal is visible to a user of the terminal, the apparatus comprising:

an interface configured to receive activity data from code running on a remote user terminal; computer code operable, when executed, to cause the computer to input said activity data into a model configured to provide an indication of whether the media displayed on the remote computer is visible to a user at the remote computer; and a calibration module arranged to receive training data to calibrate the activity data used in the model.

Preferably, the activity data received from the code running on the remote user terminal comprises data selected from one or more of: frame rate data; sleep mode data; mouse or other pointer movement data; click or other selection data; page dwell time; resource availability data; other data capable of indicating whether media within at least one frame is visible to a user.

Preferably, the model is a numerical model.

Preferably, the numerical model is implemented using regression coefficients derived from the activity data.

Preferably, the regression coefficients have been calibrated based on training data from page analysis of instances without a frame.

Typically, “at least one frame of a web page” means a frame displayed by a media player. In the case of the media being video content, “at least one frame of a web page” typically refers to the first or, optionally, another early frame in a sequence of frames played by the media player on a user terminal during playing out of the video.

The capability to detect whether the media is visible within a web page across a large number and variety of terminals may be used in a number of different applications, including fraud detection, auto-instantiated placements, reach estimation, and to give publishers greater strength in instantiating media as a specific product.

Media may be considered as one of a number of different types including rich media (non-video), such as images, and digital video content including interactive video. Media may also encompass videos in the form of advertisements or in other forms.

An impression, or a viewable impression, is a term used to describe a metric to report whether and how many media or advertisements are actually viewed by users.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is diagrammatically illustrated by way of example in the accompanying drawings, in which:

FIG. 1 shows a system overview of a network for web media distribution;

FIG. 2 shows a flow diagram of data acquisition from web media;

FIG. 3 shows a flow chart of determining visibility of media by page analysis;

FIG. 4 shows a flow chart of determining visibility of media by hybrid analysis;

FIG. 5 shows a timeline of playing a media through a web browser;

FIG. 6 shows a flow chart of system calibration; and

FIG. 7 shows a flow chart of performing linear regression.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system overview of a network for distributing web media. As shown in FIG. 1, the network includes a user terminal 10 comprising a web browser 20. The user terminal may take the form of any electronic device which is capable of running a web browser or other interface through which web-based media may be obtained. In some embodiments, the user terminal 10 may be a desktop computer or PC. In some embodiments, the user terminal 10 may be a portable or mobile device with a wired or wireless data connection. In some embodiments, the user terminal 10 may be a tablet computing device, a netbook, a laptop or a mobile phone capable of running the web browser 20. The web browser 20 may, for example, comprise one of the following web browsers: Google Chrome™, Mozilla Firefox™, Internet Explorer™ or Safari™. This list is not intended to be exhaustive.

In use, the web browser 20 may be operated to access web pages. Some web pages may be designed with specific areas or frames of the page allocated for media to play. In some embodiments, the media may be in the form of a video. In some embodiments, the media may take the form of audio or rich media, such as an image. In some embodiments, the media may be interactive.

As discussed previously, the design of some web pages may include the use of framing in order to separate the web page into a number of discrete frames or windows. Other types of web design methodology may not make use of framing.

In arrangements where the media takes the form of an advertisement, the advertisement may be distributed through a number of different servers before it is delivered to the web browser 20. The distribution of media, and particularly video media such as advertisements, can be considered a marketplace of selling and re-selling of media publications.

In an exemplary arrangement, an advertiser will make an agreement with a first publisher to publish media in the form of an advertisement a fixed number of times. Under this agreement, the advertiser will distribute the media from an ad server 70 to the first publisher's server 60. The first publisher will seek to publish the advertisement to a number of user terminals in order to fulfil the agreement. If the first publisher is unable to fulfil the agreement, the first publisher may arrange the further distribution of the media with a second publisher in order to fulfil the original agreement, and the first publisher may distribute the media to the second publisher's server 50. This may then continue to a third publisher if the second publisher is unable to fulfil the original agreement, and so on.

An ad server may be considered a server, such as a web server, that operates to store media such as advertisements. Such media may be delivered to user terminals when a user visits a particular web page or website. In addition, ad servers may also act to target particular media to particular users depending upon a set of rules. Therefore, a particular media, such as a particular advertisement, may have been placed on a plurality of different ad servers in order to form a chain between the original advertiser and the publisher's ad server 50 through any number of different ad servers before being published to a particular user terminal 30. Each advertisement 40 a, 40 b receives media from a publisher when a web page is loaded.

Where a website utilises framing in order to locate and place media, such as an advertisement, onto a web page, it is possible to utilise an HTML tag known as an inline frame or iframe. The iframe is effectively a window to a separate document. This document may be linked by use of the “src” attribute. It is possible to nest such iframes in order to navigate through the chain back to the original provider of the advertisement. In some embodiments, the original provider of the advertisement may be the advertiser.

It is generally not possible to obtain page information across domain name boundaries when iframes are used. Iframes are presented without knowledge of the contents of other frames for privacy or security reasons. Therefore, the use of iframes suits the network arrangement described in FIG. 1, since it is possible using iframes to supply media across the domain name boundaries of different ad servers. Furthermore, iframes make brokering possible by allowing advertisers to buy or sell media on exchanges.

Visibility Model

FIG. 2 illustrates a flow diagram of data acquisition regarding the media being played in the web page. It is possible to integrate messaging points within the video so that when particular events occur it is possible to perform calculations or transmit data to external devices or servers. This is commonly done by embedding functionality in the form of software code in the media player, such as, without limitation, the use of ActionScript within an Adobe™ Flash™ (swf) video.

For a particular media, such as an advertisement video, playing on a particular player, it is possible to include a number of different messaging points throughout the video. FIG. 2 shows an arrangement wherein at each of the messaging points the media player running on the user terminal transmits at least one data field to a remote server 90. It is possible to define how regularly the data packet should be sent as well as the contents of the data packet. In some embodiments, data may be sent when each frame is played. In other embodiments, messaging points may occur at alternating frames, i.e. odd or even frames. Other messaging points and messaging rates may be used.
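A minimal sketch of a messaging-point policy and data packet is given below, assuming the packet is serialised as JSON. The policy names and field names (`impression_id`, `frame_rate_fps`, `mode`) are hypothetical; in practice this logic would be embedded in the media player (for example as ActionScript) rather than written in Python.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ActivityMessage:
    """Hypothetical data packet sent from the user terminal to remote server 90."""
    impression_id: str
    frame_index: int
    timestamp_ms: float
    frame_rate_fps: float
    mode: str  # "sleep" or "active"


def is_messaging_point(frame_index: int, policy: str = "every") -> bool:
    """Decide whether this frame triggers a message under the chosen policy."""
    if policy == "every":   # send on every frame played
        return True
    if policy == "even":    # send on alternating (even) frames
        return frame_index % 2 == 0
    if policy == "odd":     # send on alternating (odd) frames
        return frame_index % 2 == 1
    raise ValueError(f"unknown policy: {policy}")


def encode(message: ActivityMessage) -> str:
    """Serialise the packet for transmission (JSON is an assumption)."""
    return json.dumps(asdict(message))
```

Sending on alternating frames halves the message rate at the cost of coarser timing information; the choice of rate is left open by the description.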

The data transmitted may include the frame rate information or the modeof the media player (e.g. sleep or active).

The data transmitted from the user terminal 10 may be sent to a remote server 90 and forwarded to a model 95 operating on the remote server 90. The remote server 90 may comprise one or more pieces of server hardware. In some embodiments, the server hardware may perform operations on the data in parallel. In some embodiments, model 95 may be implemented in software which executes on the server 90. In some embodiments the model 95 may operate across multiple pieces of hardware so as to share the processing. In some embodiments, the remote server 90 may be configured to receive data from one or more terminals simultaneously.

The model 95 operates to receive data from the user terminal 10 which is running the media within a web browser in order to determine whether the video is visible in the browser. It is advantageous to perform the processing remotely from the user terminal 10 since the user device providing the user terminal 10 may have limited processing resources. This is particularly relevant to browsers and also to mobile devices which have reduced processing capabilities.

Other optimisations which may be monitored in order to establish whether a video is playing include a feature of Google™ Chrome™ in which the beginning of “animation frames” occurs more frequently when the element being animated is visible. In addition, some 3D acceleration hardware is available to web browsers, for example through Microsoft™ Silverlight™ and the WebGL API in HTML5. However, modern graphics hardware generally supports only a single instance uploading shaders at a time.

These optimisations, although useful indicia of the activity and visibility of the media to be analysed, present challenges since the variability in the values may lead to misleading results. Specifically, as discussed above, user devices vary significantly in terms of performance and processing power, in addition to type of device. Therefore, the data retrieved from user devices alone may cause erroneous results. In order to allow for fluctuations and variances in system performance, the model 95 is utilised to perform statistical analysis on the data in order to determine with a greater accuracy whether the media is visible.

The output V of the model 95 provides a probability that indicateswhether the media loaded by a web browser is “above-the-fold” or“below-the-fold”.

Flash Sleep Mode

As discussed above, since the user devices have limited processing capability, many optimisations are used in order to reduce the amount of data processing performed. For example, Adobe™ Flash™ uses a sleep mode when the media, such as the video, is not visible on screen. It is therefore possible to infer from the behaviour of the Flash™ sleep mode whether the media is visible.

In some embodiments, the video may be in one of a number of different formats to be played by one of a number of different web video players. As noted above, some embodiments of the video may be a .swf file used as part of the Adobe Flash™ environment. As such, the video may have embedded within it computer code, such as ActionScript. The ActionScript may then be executed when the video file is called in order to execute the process to obtain data by querying the web browser.

In some embodiments, it is possible to ascertain whether the player is operating in a particular mode in order to infer whether the video is playing. For example, in some embodiments, the video player may enter a sleep mode when the video player is not visible on screen. With other players or browsers, the frame rate of the video player may be reduced when the player is not visible on the display of the user terminal. Put another way, the frame rate of the video player is the rate at which frames of the video are displayed on the screen. The frame rate of the video player may not be the same as the frame rate of the display screen. In practice, the frame rate of the video player may be significantly lower.

Although the discussion of sleep mode below is made in respect of Adobe™ Flash™, other combinations of browsers and video types may be considered and the principle of detecting the visibility of the media, e.g. a video, on the visible screen remains the same.

Determination of whether Adobe Flash has entered sleep mode involves calculating the frame rate at which the media is played within the browser. Experimental testing was performed across a number of browsers (including Google™ Chrome™, Internet Explorer™, Mozilla™ Firefox™ and Safari™) running on a variety of operating systems (Ubuntu™ Linux™, Mac™ OSX™ and Windows™). The testing found, for a large number of combinations of operating system and browser, that a frame rate of 2 fps was largely representative of Adobe™ Flash™ entering sleep mode. The tests also found that frame rates of 7, 8, 9 or 10 fps occurred when Adobe™ Flash™ was not in sleep mode. It is therefore possible to select a threshold between these values in order to determine whether sleep mode has been entered or, alternatively, whether Flash is in sleep mode.
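A threshold between the two observed bands can be sketched as below. Placing it at the midpoint of 2 fps and 7 fps (4.5 fps) is a hypothetical choice made for illustration; the description only says that a threshold between these values may be selected.

```python
# Frame rates reported in the testing described above.
SLEEP_FPS = 2.0        # typical rate once Flash has entered sleep mode
MIN_ACTIVE_FPS = 7.0   # lowest rate seen when Flash was not asleep

# Hypothetical choice: midpoint of the gap between the two bands.
SLEEP_THRESHOLD_FPS = (SLEEP_FPS + MIN_ACTIVE_FPS) / 2  # 4.5 fps


def classify_mode(frame_rate_fps: float,
                  threshold: float = SLEEP_THRESHOLD_FPS) -> str:
    """Label a measured frame rate as "sleep" (below threshold) or "active"."""
    return "sleep" if frame_rate_fps < threshold else "active"
```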

In order to calculate the frame rate at which the media is played, software monitors the frame rate of the video player embedded in the web page as it plays the digital media. The software embedded in the video receives an event which indicates that a frame has been delivered. For example, in Adobe™ Flash™ the “onEnterFrame” event is used. The time at which the event is received is stored. It is then possible to compare the timestamps at which different events are received in order to determine the frame rate. In some embodiments, the frame rate may be calculated across a number of frames; for example, a rolling average may be used. In some embodiments, the identification of the change from sleep mode to an active mode may require the frame rate to be above a particular threshold for a predetermined period of time.
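The timestamp-based estimate, rolling average and hold period described above can be sketched as a small state machine. This is an illustration under assumptions, not the embedded player code: `on_frame` stands in for the frame-delivered event handler, and the window of 10 intervals, the 4.5 fps threshold and the 1000 ms hold period are hypothetical defaults. Dropping into sleep immediately, while requiring a sustained period above threshold before waking, is also an assumption.

```python
from collections import deque
from typing import Deque, Optional


class FrameRateMonitor:
    """Rolling-average frame-rate estimate with a hold period before waking."""

    def __init__(self, window: int = 10, threshold_fps: float = 4.5,
                 wake_hold_ms: float = 1000.0) -> None:
        self._intervals: Deque[float] = deque(maxlen=window)  # recent frame gaps (ms)
        self._last_ts: Optional[float] = None
        self._above_since: Optional[float] = None  # when fps first rose above threshold
        self.threshold_fps = threshold_fps
        self.wake_hold_ms = wake_hold_ms
        self.mode = "unknown"  # becomes "sleep" or "active" once there is evidence

    def frame_rate(self) -> Optional[float]:
        """Frames per second from the mean of the recent inter-frame intervals."""
        if not self._intervals:
            return None
        mean_interval = sum(self._intervals) / len(self._intervals)
        return 1000.0 / mean_interval if mean_interval > 0 else None

    def on_frame(self, timestamp_ms: float) -> str:
        """Record one frame-delivered event and return the current mode."""
        if self._last_ts is not None:
            self._intervals.append(timestamp_ms - self._last_ts)
        self._last_ts = timestamp_ms
        fps = self.frame_rate()
        if fps is None:
            return self.mode
        if fps < self.threshold_fps:
            self.mode = "sleep"  # drop into sleep as soon as the average falls
            self._above_since = None
        else:
            if self._above_since is None:
                self._above_since = timestamp_ms
            # Declare "active" only once fps has stayed above threshold long enough.
            if timestamp_ms - self._above_since >= self.wake_hold_ms:
                self.mode = "active"
        return self.mode
```

For example, frames arriving every 100 ms give an estimate of 10 fps, and the monitor reports "active" once that rate has been sustained for one second.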

Detection Model

As discussed above, a model is used to determine the probability of an impression at the user terminal occurring “above-the-fold” or “below-the-fold”. The determination is based upon a number of variables which are stored for the model. As described later, the variables are calibrated and optimised based upon a set of training data.

In some embodiments, the model utilises a linear regression model. While the disclosed example uses Generalised Linear Regression, any suitable regression analysis may be used, for example, without limitation, one or more of Ordinary Least Squares, Instrumental Variables, and Ridge Regression. The model utilises the frame rate information and the Flash Sleep Mode (FSM) information obtained from the user terminal in order to calculate a number of variables for use in the model for a particular impression. Specifically, the model utilises Flash Sleep Mode (FSM) and frame rate information in order to determine the variables below prior to performing the linear regression. In some embodiments, the variables may be calculated by the software embedded in the media playing in the browser. In some embodiments, the variables may be calculated by another application running within the server 90, which passes the variables to the model.

There are many different variables which may be used in the linear regression model. One or more of the variables below may be used:

X₁  FPSLPTIM  The amount of time spent in sleep mode (ms).
X₂  FPWAKTIM  The amount of time spent in active or awake mode (ms).
X₃  FPNUMWAK  The number of times the flash element transitioned into active or awake mode.
X₄  FPNUMSLP  The number of times the flash element transitioned into sleep mode.
X₅  FPSLPPRC  % of the total monitored time spent in sleep mode, $\left( \frac{FPSLPTIM}{FPSLPTIM + FPWAKTIM} \right)$
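The five variables can be derived from a time-ordered record of the player's mode. The sketch below assumes each recorded mode persists until the next sample (or until the end of monitoring), and that the initial mode does not count as a transition; both are assumptions. The function name is hypothetical. FPSLPPRC is returned as a fraction, matching the formula above; multiply by 100 for a percentage.

```python
from typing import Dict, List, Tuple


def activity_features(samples: List[Tuple[float, str]],
                      end_ms: float) -> Dict[str, float]:
    """Compute X1..X5 from time-ordered (timestamp_ms, mode) pairs.

    mode is "sleep" or "active"; any non-"sleep" mode is counted as active.
    """
    slp_tim = wak_tim = 0.0
    num_wak = num_slp = 0
    prev_mode = None
    for i, (ts, mode) in enumerate(samples):
        # The mode is assumed to last until the next sample or end of monitoring.
        nxt = samples[i + 1][0] if i + 1 < len(samples) else end_ms
        if mode == "sleep":
            slp_tim += nxt - ts
        else:
            wak_tim += nxt - ts
        # Count transitions between consecutive samples only.
        if prev_mode is not None and mode != prev_mode:
            if mode == "sleep":
                num_slp += 1
            else:
                num_wak += 1
        prev_mode = mode
    total = slp_tim + wak_tim
    return {
        "FPSLPTIM": slp_tim,                                 # X1
        "FPWAKTIM": wak_tim,                                 # X2
        "FPNUMWAK": num_wak,                                 # X3
        "FPNUMSLP": num_slp,                                 # X4
        "FPSLPPRC": slp_tim / total if total > 0 else 0.0,   # X5
    }
```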

The model is in this example a simple logistic regression model, wherein the binary variable BTF is given a value of 1 for an impression occurring “below-the-fold” and 0 for an impression occurring “above-the-fold”. The probability of the impression occurring “below-the-fold” is given by:

${P( {{BTF} = {1\text{/}X}} )} = \frac{1}{1 + ^{{- X^{T}}\beta}}$

Where β are the logistic regression coefficients and X is the vector ofvariables defined above,

X ^(T)β=β₁ X ₁+β₂ X ₂+β₃ X ₃+β₄ X ₄+β₅ X ₅
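The probability above can be evaluated directly once coefficients are available. The sketch below follows the equations as written (five variables, no intercept term) and uses a numerically stable form of the logistic function. The helper names are hypothetical, and the coefficient values in any call are placeholders rather than calibrated values.

```python
import math
from typing import Mapping, Sequence

# Order of X1..X5 as listed in the variable table.
FEATURE_ORDER = ("FPSLPTIM", "FPWAKTIM", "FPNUMWAK", "FPNUMSLP", "FPSLPPRC")


def to_vector(features: Mapping[str, float]) -> list:
    """Arrange named variables into the vector X in the documented order."""
    return [float(features[name]) for name in FEATURE_ORDER]


def p_below_the_fold(x: Sequence[float], beta: Sequence[float]) -> float:
    """P(BTF = 1 | X) = 1 / (1 + exp(-X^T beta))."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = sum(b * xi for b, xi in zip(beta, x))  # X^T beta
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Because the raw variables mix milliseconds, counts and a fraction, in practice they would likely be rescaled before fitting so that no single coefficient dominates; the description does not specify this step.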

The process of determining the logistic regression coefficients is described in more detail later; the coefficients are determined empirically by analysis of data representative of known visible and non-visible video played across a range of circumstances.
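One way to determine the coefficients empirically is maximum-likelihood fitting on impressions whose label BTF is known, for example from page analysis of unframed pages. The sketch below uses plain batch gradient ascent on the log-likelihood to stay dependency-free; this is a substitute for the iteratively reweighted least squares solver typically used for a generalised linear model with a logit link, and the learning rate and epoch count are hypothetical. Like the equations above, it fits no intercept.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit beta by batch gradient ascent on the logistic log-likelihood.

    X rows are (ideally standardised) feature vectors; y holds BTF labels,
    1 = below the fold, 0 = above the fold.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for row, label in zip(X, y):
            # Gradient of the log-likelihood: (y - p) * x for each example.
            err = label - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(d):
                grad[j] += err * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

With a single standardised sleep-fraction feature, for instance, impressions that slept more than average and were labelled below-the-fold drive the coefficient positive, so a high sleep fraction raises the predicted probability of BTF.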

An advantage of utilising the above approach to probabilistic estimation based on a number of variables over utilising the raw data is that the FPS data may be strongly linked to the processing performance of the user's device. The variation of system performance between users may be accounted for by utilising the regression model.

Page Analysis

FIG. 3 illustrates a flow chart of the decision process by which the analysis is made in order to ascertain whether an advertisement is visible within a web browser.

Page analysis relies upon knowing the size of the web page relative to the browser window and the relative location of the media, such as the advertisement, with respect to the dimensions of the web page. It is common for web pages to be larger than the current visible window which is displayed on the display screen of the user terminal. In use, the user must scroll down the web page to see the content further down. Put another way, when a web page loads, there is content visible on the display screen. In some cases, media at the bottom of the page may only be partially visible since the remainder of that media may be further down the web page and thus not visible in the current browser window (the presently visible portion of the browser).

In some examples, a web page may be twice the depth of the current browser window. In this case, it is possible to scroll down the web page to reveal media and content which may not have been visible when the web page was initially loaded. In order to quantify this more closely, it is possible to separate the web page into a number of “pages”. Each page can be considered a non-overlapping rectangle which has the dimensions of a browser window. It is possible to determine on which “page” of the web page the media, such as the advertisement, is placed. If the media is placed on the second page, it is necessary to scroll down the web page by the depth of one browser window in order that the media is visible. Similarly, if the media is placed on the third page, it is necessary to scroll down the web page by the depth of two browser windows in order that the media is visible.
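The “page” arithmetic can be sketched in a few lines. This minimal illustration assumes the media position is given by its top edge in pixels from the top of the web page and considers vertical scrolling only; the function names are hypothetical.

```python
def page_index(media_top_px: int, window_height_px: int) -> int:
    """1-based "page" (window-sized slice of the web page) holding the media's top edge."""
    if window_height_px <= 0:
        raise ValueError("window height must be positive")
    return media_top_px // window_height_px + 1


def scroll_needed_px(media_top_px: int, window_height_px: int) -> int:
    """Vertical scroll, in whole windows, needed to bring that page into view."""
    return (page_index(media_top_px, window_height_px) - 1) * window_height_px


def on_first_page(media_top_px: int, window_height_px: int) -> bool:
    """Media on the first page is "above-the-fold"; any later page is "below-the-fold"."""
    return page_index(media_top_px, window_height_px) == 1
```

With an 800-pixel window, media starting 1700 pixels down the page lies on the third page and needs two windows (1600 pixels) of scrolling, matching the example above.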

By determining the “page” on which the media is placed within the web page, it is possible to determine whether the advertisement is “above-the-fold” or “below-the-fold”. Advertisements are typically referred to as “below-the-fold” if, when the web page loads, the advertisement is not located on the visible portion of the browser window. As such, it is necessary for the user to scroll down the web page in order for the advertisement to be visible. Conversely, the advertisement is considered to be “above-the-fold” if, when the web page is loaded, the advertisement is located on the first “page” of the web page. Put another way, an “above-the-fold” advertisement is visible in the browser window when the web page is loaded.

Whether a particular piece of media appears “below-the-fold” or “above-the-fold” is a particularly useful metric in determining whether the media was viewed by the user. It is common for users to follow a link or move to a different web page before scrolling down the web page. Users commonly do not view or take note of media which appears further down a web page. Therefore, determining whether media appears “above-the-fold” or “below-the-fold” can be a powerful indicator as to whether media has been viewed on a web page by a particular user.

In embodiments where the publication of the media is in the form of an advertisement, the advert may not be paid for if it appears “below-the-fold”.

The page analysis methodology illustrated in FIG. 3 has proved to be a very reliable method for determining whether a particular media is “below-the-fold” for web pages where the media is not placed within a frame, such as an HTML <iframe>. As illustrated in FIG. 3, the process begins by starting the analysis at step 100. The second step in the process, step 120, is to determine whether the advertisement appears within a frame, such as an <iframe>, or is placed directly at a position on the web page. It is not possible to determine the relative location of the media with respect to the remainder of the web page if the media is in an iframe, since the content of the iframe is logically isolated within the iframe. Without knowledge of the relative location of the media, it is not possible to determine on which “page” of the web page (i.e. the position of the media relative to the total depth) the advertisement appears. If this is the case, then page analysis cannot be performed and an error is returned at step 140.

There are many mechanisms for determining whether the web page utilises iframes, and hence whether page analysis may be used. For example, it is possible to parse the HTML to identify the <iframe> tag, or alternatively to traverse the parent and window JavaScript objects. It is also possible to determine whether the web page uses frames by analysing whether WINDEPTH (the window depth) is equal to zero. If WINDEPTH == 0, page analysis can be used without significantly erroneous results. However, if WINDEPTH is not zero (WINDEPTH != 0), page analysis cannot be used.
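The source does not specify how WINDEPTH is computed. A minimal sketch, assuming it counts the parent hops from the media's window up to the top-level window (in a browser, the walk up `window.parent` until `window.parent === window`), is shown below in Python against a hypothetical `WindowNode` stand-in for the browser's window object:

```python
from typing import Optional


class WindowNode:
    """Hypothetical stand-in for a browser window object.

    As with window.parent in a browser, the top-level window is its own parent.
    """

    def __init__(self, parent: Optional["WindowNode"] = None) -> None:
        self.parent: "WindowNode" = parent if parent is not None else self


def window_depth(win: WindowNode) -> int:
    """Return WINDEPTH: the number of frame levels between `win` and the top window."""
    depth = 0
    while win.parent is not win:
        win = win.parent
        depth += 1
    return depth


def page_analysis_available(win: WindowNode) -> bool:
    """Page analysis is only used when the media is not framed (WINDEPTH == 0)."""
    return window_depth(win) == 0
```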

Using page analysis at step 130, the position of the digital media within the web page is determined. In some embodiments, the position of the media to be played is determined as the number of pixels from the top-left corner of the web page to the digital media. In addition, the browser window width and height are determined. The browser window is the window of the browser which is displayed on the display screen; its size may be determined by the resolution of the display screen and by the proportion of the display screen which is used by the web browser.

At step 130 it is then possible to determine the position of the digital media relative to the current display in the browser window. Specifically, it is possible to determine whether the media is located on the first “page” (the visible portion) of the web page. If the media, such as the advertisement, is identified as being on the second or a later “page” of the particular web page, the media is determined to be “below-the-fold”. However, if the digital media is determined to be located on the first page of the web page, the digital media is determined to be “above-the-fold”.
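A minimal sketch of the step-130 classification follows. It assumes the media's vertical offset from the top of the page and the browser window height are both measured in pixels, and (since the source does not say) that the media's top edge decides which “page” it belongs to. The function names are hypothetical:

```python
def fold_page(media_top_px: int, viewport_height_px: int) -> int:
    """Return the 0-based "page" of the web page that contains the media's top edge."""
    if viewport_height_px <= 0:
        raise ValueError("browser window height must be positive")
    if media_top_px < 0:
        raise ValueError("media offset must be non-negative")
    return media_top_px // viewport_height_px


def is_below_the_fold(media_top_px: int, viewport_height_px: int) -> bool:
    """Second or later page: below-the-fold. First page (index 0): above-the-fold."""
    return fold_page(media_top_px, viewport_height_px) >= 1
```

For example, with an 800-pixel-high window, media starting 1200 pixels down the page is on page index 1 and is therefore classified as below-the-fold.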

In some embodiments, an equal cost model for misclassifications is used. Broadly, it is desirable to avoid false positive results, e.g. to minimise the number of “above-the-fold” impressions classified as “below-the-fold”. In some embodiments, different cost functions may be used in order to minimise false positives or false negatives.
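With unequal costs, a standard decision-theoretic argument gives the probability threshold. If p is the estimated probability of “below-the-fold”, predicting “below-the-fold” has expected cost (1 − p)·c_FP and predicting “above-the-fold” has expected cost p·c_FN, so “below-the-fold” is the cheaper prediction when p > c_FP / (c_FP + c_FN). Equal costs reduce this to a threshold of 0.5. A sketch, with hypothetical parameter names:

```python
def decision_threshold(cost_false_btf: float, cost_missed_btf: float) -> float:
    """Probability above which predicting below-the-fold minimises expected cost.

    cost_false_btf:  cost of labelling an above-the-fold impression BTF (false positive)
    cost_missed_btf: cost of labelling a below-the-fold impression ATF (false negative)
    """
    if cost_false_btf < 0 or cost_missed_btf < 0:
        raise ValueError("costs must be non-negative")
    if cost_false_btf + cost_missed_btf == 0:
        raise ValueError("at least one cost must be positive")
    return cost_false_btf / (cost_false_btf + cost_missed_btf)


def classify_btf(p_btf: float, cost_false_btf: float = 1.0,
                 cost_missed_btf: float = 1.0) -> bool:
    """True (below-the-fold) when p_btf exceeds the cost-derived threshold."""
    return p_btf > decision_threshold(cost_false_btf, cost_missed_btf)
```

Making false positives three times as costly as false negatives raises the threshold to 0.75, so fewer impressions are labelled below-the-fold.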

Hybrid Approach

FIG. 4 is a flow chart showing a hybrid approach for determining the visibility of digital media. As discussed previously, page analysis is a reliable mechanism for determining whether digital media is visible on a web page. However, page analysis cannot be used to determine whether particular digital media is visible if the web page utilises framing. Therefore, in known arrangements, no metrics are available in circumstances where page analysis is unavailable.

FIG. 4 illustrates a hybrid arrangement in which a different analysis method is used for web pages where it is not appropriate to utilise page analysis, for example web pages which utilise framing such as iframes.

Steps 120 and 130 of FIG. 4 are the same as steps 120 and 130 described with respect to FIG. 3, where page analysis is used to determine whether media is “above-the-fold” or “below-the-fold”. The difference between step 120 of FIG. 3 and step 120 of FIG. 4 is that, in the hybrid approach of FIG. 4, an error is not automatically raised when the web page uses framing.

Specifically, if it is determined at step 120 that the web page does use framing, the hybrid process of FIG. 4 proceeds to step 150. Step 150 ascertains the browser type and version number and then checks this information to ascertain whether the hybrid method may be applied to the particular browser version used by the user terminal.

Since there are many different browser types and versions, it is not always feasible to test reliably and confirm that the method works for all of them. Further, a large proportion of users are likely to use one of a few popular browsers. It is therefore more effective to develop the method for the most commonly used browser types.

Therefore, if the browser type or version is determined to be unsupported by the hybrid method, the process proceeds to step 160, in which an error is returned indicating that the method was unable to determine whether the media was visible to the user. Such a result reduces the census size of the particular metric.
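Step 150 can be sketched as a lookup against a table of browsers for which the hybrid method has been validated. The browser names and minimum versions below are placeholders, since the source does not list which browsers are supported:

```python
from typing import Dict

# Placeholder table: browser name -> minimum supported major version.
# The real list of browsers validated for the hybrid method is not given in the source.
SUPPORTED_BROWSERS: Dict[str, int] = {"browser_a": 10, "browser_b": 20}


def hybrid_supported(browser: str, major_version: int,
                     table: Dict[str, int] = SUPPORTED_BROWSERS) -> bool:
    """Step 150: True if the hybrid method has been validated for this browser version."""
    minimum = table.get(browser.lower())
    return minimum is not None and major_version >= minimum


def step_150(browser: str, major_version: int) -> str:
    """Route to step 170 if supported; otherwise return the step-160 error."""
    if hybrid_supported(browser, major_version):
        return "step_170"
    return "error: visibility could not be determined"
```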

If the browser type and version are determined to be supported by the hybrid method, the process proceeds to step 170. At step 170, the variables which are input into the model 95 are obtained, as described earlier with respect to FIG. 2. The variables obtained may comprise one or more of: FPSLPTIM, FPWAKTIM, FPNUMWAK, FPNUMSLP and FPSLPPRC. Having obtained these variables for a particular impression, the process proceeds to step 180, in which the values of the variables are checked. In some embodiments, the values of FPNUMWAK and FPNUMSLP are checked to ensure that they are not negative; any negative value is replaced with zero.
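A sketch of the step-180 check follows. It assumes the five variables arrive as a name-to-value mapping. Rejecting an impression with a missing variable is an added assumption, not something the source describes:

```python
from typing import Dict, Mapping

FEATURES = ("FPSLPTIM", "FPWAKTIM", "FPNUMWAK", "FPNUMSLP", "FPSLPPRC")
COUNT_FEATURES = ("FPNUMWAK", "FPNUMSLP")  # counts that must not be negative


def sanitise(raw: Mapping[str, float]) -> Dict[str, float]:
    """Step 180: copy the model inputs, replacing negative counts with zero."""
    missing = [name for name in FEATURES if name not in raw]
    if missing:
        # Assumption: an incomplete feature vector is rejected rather than imputed.
        raise KeyError(f"missing model inputs: {missing}")
    clean = {name: float(raw[name]) for name in FEATURES}
    for name in COUNT_FEATURES:
        if clean[name] < 0:
            clean[name] = 0.0
    return clean
```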

The process then proceeds to step 190, in which the model is run to determine P(BTF=1|X). Put another way, the model is run to determine the probability that the impression occurred “below-the-fold”. The output of the model is a decimal value, assigned to Y, which lies in the range 0≦Y≦1.

The process then proceeds to step 200, in which the value Y is assessed to determine whether it is greater than 0.5. If Y is greater than 0.5, the impression is determined to be “below-the-fold”. If Y is less than or equal to 0.5, the impression is determined to be “above-the-fold”, i.e. visible.
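Steps 190 and 200 can be sketched as follows. The sketch assumes the model is a logistic regression, consistent with the reference to logistic regression coefficients in the calibration section, with an intercept `beta0` and one coefficient per input variable. The sigmoid is written in a numerically stable form so that large negative scores do not overflow `math.exp`:

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_below_the_fold(x: Sequence[float], beta0: float, beta: Sequence[float]) -> float:
    """Step 190: estimate Y = P(BTF=1 | X) from calibrated coefficients."""
    if len(x) != len(beta):
        raise ValueError("feature vector and coefficient vector differ in length")
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def classify(y: float) -> str:
    """Step 200: Y > 0.5 is below-the-fold; otherwise above-the-fold (visible)."""
    return "below-the-fold" if y > 0.5 else "above-the-fold"
```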

FIG. 5 illustrates a time line of activity when media is requested by a web page. As discussed previously, in some embodiments it is possible to cross domains by utilising iframes. This enables the media to be called through a chain of ad servers, each operated by a different publisher, as described with respect to FIG. 1. The iframes are called until the iframe call reaches a resource on the server 70 which hosts the media to be played. The point at which the call reaches the server 70 is referred to as the “first call” and initiates the process of loading the media to be provided to the web page. The next step in the time line is to call the player host. In the exemplary time line of FIG. 5, the player host is called to load a swf video file, which is forwarded to the web page through the iframe transitions across domains.

In the exemplary arrangement of FIG. 5, Adobe Flash sleep mode is utilised to determine whether the media is visible. When the video player is called, the software code embedded in the player begins to execute, generating information relating to Flash sleep mode (FSM), page analysis (PA) and frames per second (fps) for the particular media. This information is then transmitted from the user terminal to the model 95 for processing.

The next significant point in the time line is the “first frame”, which indicates that the media has begun playing. In some embodiments, the information obtained prior to delivery of the first frame, such as sleep mode, page analysis and frame rate data, may be used to determine whether to begin playing the media. In some embodiments, the information obtained prior to delivery of the first frame may be discarded or not utilised. The information obtained after the “first frame” point may be used to determine whether the media is being played “above-the-fold” or “below-the-fold”.
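Discarding the information gathered before the first frame can be sketched as a filter over timestamped samples. The `Sample` record and its millisecond timestamps relative to the “first call” are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    t_ms: int        # assumed: milliseconds since the "first call"
    sleeping: int    # sleep-mode flag: 1 = sleep mode, 0 = active
    fps: float       # measured frame rate


def samples_after_first_frame(samples: List[Sample],
                              first_frame_ms: Optional[int]) -> List[Sample]:
    """Keep only samples taken at or after the first frame.

    Returns an empty list if playback never reached a first frame.
    """
    if first_frame_ms is None:
        return []
    return [s for s in samples if s.t_ms >= first_frame_ms]
```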

As shown in FIG. 5, the sleep mode data is in the form of binary information: a ‘1’ indicates that Adobe Flash is operating in sleep mode, whilst a ‘0’ indicates that Adobe Flash is operating in active mode. Similarly, the page analysis data, when available, is binary information indicating whether the media is located “below-the-fold” or “above-the-fold”. Where page analysis data is not available, an error message is returned. The frame rate data is not binary and may take any value within the capabilities of the video player, the display screen of the user device and the web browser; for example, the frame rate may be between 0 fps and 50 fps. As discussed previously, the sleep mode data does not necessarily directly follow the frame rate data, since the sleep mode may be determined by calculating the frame rate across a number of frames or by means of a weighted average. As such, there may be some lag between changes in frame rate and the detection of a change between the active and sleep modes of Adobe Flash.
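The lag can be reproduced with a sketch that derives the sleep flag from an exponentially weighted moving average of the frame rate and compares it with a threshold (claim 11 refers to a predetermined frame-rate threshold). The 5 fps threshold and the smoothing factor of 0.5 are illustrative assumptions, not values from the source:

```python
from typing import Iterable, List, Optional


def sleep_mode_series(fps_samples: Iterable[float], threshold_fps: float = 5.0,
                      alpha: float = 0.5) -> List[int]:
    """Derive a 0/1 sleep-mode flag from frame-rate samples.

    The frame rate is smoothed with an exponentially weighted moving average;
    the flag is 1 (sleep) while the smoothed rate is below threshold_fps and
    0 (active) otherwise. The smoothing is what makes the flag lag the raw rate.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    flags: List[int] = []
    avg: Optional[float] = None
    for fps in fps_samples:
        avg = fps if avg is None else alpha * fps + (1.0 - alpha) * avg
        flags.append(1 if avg < threshold_fps else 0)
    return flags
```

With `sleep_mode_series([30, 30, 1, 1, 1, 1])` the raw frame rate drops at the third sample, but the flag only switches to 1 at the fifth: `[0, 0, 0, 0, 1, 1]`. With `alpha=1.0` (no smoothing) the flag follows the raw rate immediately.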

Calibration

As discussed above, where web pages are configured so that page analysis is not available, for example where a web page utilises framing, it is possible to determine whether media is visible on the web page by other methods. A hybrid approach to determining the visibility of media is described earlier, in which a model is used to estimate the probability that particular media is visible based upon the frame rate and mode of the player.

In some embodiments, a linear regression model, such as a logistic (generalised linear) regression model, is used to calculate the probabilistic estimate. To use the regression model effectively, it is necessary to calibrate the model to a known data set by determining the logistic regression coefficients, β, using training data.
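Assuming the logistic form, the model and the objective that calibration maximises can be written as follows, where the X_j are the model inputs (for example FPSLPTIM, FPWAKTIM, FPNUMWAK, FPNUMSLP and FPSLPPRC) and b_i ∈ {0, 1} is the page-analysis label of training impression i. This is the standard maximum-likelihood formulation of logistic regression; the source does not state the objective explicitly:

```latex
P(\mathrm{BTF}=1 \mid X) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_j\right)\right)}

\hat{\beta} = \arg\max_{\beta} \sum_{i=1}^{n} \left[ b_i \log p_i + (1 - b_i) \log\left(1 - p_i\right) \right],
\qquad p_i = P(\mathrm{BTF}=1 \mid X_i)
```

The gradient of this objective with respect to β_j is the sum over impressions of (b_i − p_i) X_ij, which is what an iterative fitting procedure follows.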

FIG. 6 illustrates a flow chart of the calibration process used to determine the regression coefficients β. The calibration process is performed as part of the analysis of media loaded into a web page, so that real data can be utilised. The calibration process begins by performing step 120, in which it is determined whether page analysis can be used to determine whether media is visible on a web page, as described earlier.

If it is possible to use page analysis, the process proceeds to step 550 and performs page analysis to determine whether the media is “above-the-fold” or “below-the-fold”. The result of the page analysis is reported as described above. The calibration process then proceeds to step 560, which involves finding the regression coefficients that predict the page analysis result.

In some embodiments, the step of finding the regression coefficients β may be performed after each impression, so that the regression model is regularly updated. However, this is processor-intensive since, in practice, large numbers of impressions occur each day. Therefore, in some embodiments, updating the regression coefficients β to calibrate the model 95 may be “batched”, so that calibration occurs periodically using a prepared set of training data. For example, the calibration process may occur daily, weekly, bi-weekly or monthly. The calibration process may be performed off-line or on separate hardware so as not to impact the functionality of the server 90.

The training data is obtained whenever page analysis is performed and is stored for calibration at a later date. Obtaining the training data involves storing the result of the page analysis, i.e. whether the media appeared “below-the-fold” or “above-the-fold”. Stored with the result of the page analysis is the data which forms the vector of variables X input into the model. In some embodiments, the stored data may be in the form of the frame rate and the sleep mode data. In other embodiments, each of the variables FPSLPTIM, FPWAKTIM, FPNUMWAK, FPNUMSLP and FPSLPPRC may be stored.
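A sketch of how training records might be accumulated and handed to a periodic, off-line calibration job follows. The record layout, the `TrainingStore` class and the choice to retain all records between calibrations are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TrainingRecord:
    features: Tuple[float, ...]  # e.g. FPSLPTIM, FPWAKTIM, FPNUMWAK, FPNUMSLP, FPSLPPRC
    below_the_fold: int          # page-analysis label: 1 = BTF, 0 = ATF


@dataclass
class TrainingStore:
    interval: timedelta          # calibration period, e.g. timedelta(days=1)
    last_calibrated: datetime
    records: List[TrainingRecord] = field(default_factory=list)

    def add(self, features: Iterable[float], below_the_fold: bool) -> None:
        """Store one page-analysed impression together with its model inputs."""
        self.records.append(
            TrainingRecord(tuple(float(v) for v in features), int(below_the_fold)))

    def due(self, now: datetime) -> bool:
        """True once a full calibration period has elapsed."""
        return now - self.last_calibrated >= self.interval

    def take_batch(self, now: datetime) -> List[TrainingRecord]:
        """Snapshot the training set for an off-line run and restart the timer."""
        self.last_calibrated = now
        return list(self.records)
```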

As shown in FIG. 6, the coefficients are updated and the process returns to the “New Impression” step in order to process the next impression. Processing the next impression, i.e. the next media placed within a web page, involves proceeding to step 120 again, in which it is determined whether page analysis can be used to determine whether the media is “below-the-fold” or “above-the-fold”. If it is determined that page analysis cannot be used to determine the visibility of the media on the web page, the process obtains data such as the FPS and FSM data for the particular impression. The FPS and FSM information can be used to determine the vector of variables X. With this information, the model, such as the linear regression model, can be run using the updated coefficients β.

The linear regression is therefore performed at step 600 to obtain V, the output of the model, which is indicative of whether the impression is “above-the-fold” or “below-the-fold”. After the result of the model is reported at step 610, the process returns to the start and begins again.
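One pass of the FIG. 6 loop for a single impression can be sketched as below. Passing `None` for the page-analysis result stands in for the framed case. The function and parameter names are hypothetical, and the model is assumed to be a logistic regression:

```python
import math
from typing import List, Optional, Sequence, Tuple


def logistic(z: float) -> float:
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def process_impression(features: Sequence[float],
                       page_analysis_btf: Optional[bool],
                       beta0: float, beta: Sequence[float],
                       training: List[Tuple[Tuple[float, ...], int]]) -> Tuple[str, str]:
    """Handle one impression; returns (source of the result, classification).

    page_analysis_btf is None when page analysis is unavailable (framed media).
    """
    if page_analysis_btf is not None:
        # Steps 550/560: report page analysis and keep it as a labelled training record.
        training.append((tuple(features), int(page_analysis_btf)))
        return "page_analysis", "below-the-fold" if page_analysis_btf else "above-the-fold"
    # Steps 600/610: no page analysis, so run the model with the latest coefficients.
    v = logistic(beta0 + sum(b * x for b, x in zip(beta, features)))
    return "model", "below-the-fold" if v > 0.5 else "above-the-fold"
```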

The calibration process is therefore iterative, in that the regression coefficients are determined from impressions for which page analysis is performed. These coefficients are then used when page analysis is not available. Since the results of page analysis are reliable, they form a strong data set against which the model can be calibrated and verified. In this way, the data which would normally be used when page analysis is unavailable can also be obtained when page analysis is available. It is therefore possible to verify the result of performing the linear regression against the known and accurate result of page analysis.

The assumption that allows this calibration process to be performed is that the behaviour of the media being played is the same whether or not the web page incorporates framing. For example, it is assumed that media which is presently positioned “below-the-fold” in a web page will reduce its frame rate regardless of whether or not the media is located within a frame. Since the system is able to determine accurately whether media not in a frame is “below-the-fold”, the system can use this information to verify, to a particular degree of accuracy, whether the model results reflect those of the page analysis.

For example, in some embodiments, it is possible to determine whether the results of the linear regression model for a particular set of coefficients agree with the page analysis results for at least 95% of the impressions in the complete training dataset. If so, the coefficients may be deemed acceptable for use within the model. In other embodiments, other thresholds may be used, such as at least 85%, 90%, 91%, 92%, 93%, 94%, 96%, 97%, 98% or 99%. A skilled person will understand that the choice of percentage threshold will be informed by the required level of confidence in the statistically derived results.

FIG. 7 illustrates a flow chart of a method for determining the regression coefficients β. As discussed earlier, this process may be performed in real time, but is more practically performed off-line on a periodic basis. In some embodiments, this process may be performed when a new operating system, browser type or browser version is released, in order to broaden the training set to encompass the new data.

The process of determining the regression coefficients begins at step 700 by obtaining the training set, by download or otherwise. The training data set is formed of the results of the page analysis, i.e. whether the media is located “above-the-fold” or “below-the-fold”, together with the frame rate and sleep mode data obtained for the same impressions.

At step 710, the training data is input into the linear regression model, so that for each impression in the training data a corresponding V value is produced as a decimal number between 0 and 1. The outputs V of the linear regression model are placed into a test pipeline 720 alongside the corresponding results of the page analysis. Each impression is therefore labelled as either viewable or non-viewable.
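The source does not say how the coefficients are fitted at step 710. As a stand-in, the sketch below fits a logistic regression by plain batch gradient ascent on the log-likelihood and then produces one V value per impression; a production system would more likely call an established solver. On perfectly separable training data the coefficients keep growing, so the fixed epoch count also acts as a stopping rule:

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[float, List[float]]:
    """Fit intercept b0 and coefficients b by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - _sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_v(X: Sequence[Sequence[float]], b0: float, b: Sequence[float]) -> List[float]:
    """Step 710: one V value in [0, 1] per training impression."""
    return [_sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) for xi in X]
```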

The process then proceeds to comparison step 730 where, for each impression, the result of the page analysis and the result of the linear regression are compared to see whether they agree. For example, if both the page analysis and the linear regression outputs indicate that the media was not visible on the web page, i.e. was “below-the-fold”, then the outputs agree. This comparison is performed across all of the impressions in the dataset, and a percentage of agreement is determined for the training data set. In some embodiments, the predictive performance is assessed using 10-fold cross-validation.
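The comparison at step 730 and the 10-fold cross-validation can be sketched independently of the fitting method by passing `fit` and `predict` in as callables. Pooling the held-out predictions from every fold into a single agreement figure is one reasonable choice; the source does not specify how fold results are combined:

```python
import random
from typing import Any, Callable, List, Sequence


def agreement(predicted: Sequence[int], page_analysis: Sequence[int]) -> float:
    """Step 730: fraction of impressions on which model and page analysis agree."""
    if not predicted or len(predicted) != len(page_analysis):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == a for p, a in zip(predicted, page_analysis)) / len(predicted)


def cross_validated_agreement(X: Sequence[Sequence[float]], y: Sequence[int],
                              fit: Callable[[list, list], Any],
                              predict: Callable[[Any, list], List[float]],
                              k: int = 10, seed: int = 0) -> float:
    """k-fold cross-validation of agreement with page analysis.

    Each fold is held out once; the model is fitted on the other k-1 folds and
    its V values for the held-out impressions are thresholded at 0.5.
    """
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    predicted: List[int] = []
    actual: List[int] = []
    for fold in folds:
        if not fold:
            continue  # fewer impressions than folds
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        v_values = predict(model, [X[i] for i in fold])
        predicted.extend(1 if v > 0.5 else 0 for v in v_values)
        actual.extend(y[i] for i in fold)
    return agreement(predicted, actual)
```

Against the 95% figure mentioned above, a set of coefficients would be accepted when `cross_validated_agreement(...) >= 0.95`.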

The results of the comparison are reviewed at step 900 and, if the results are deemed acceptable, i.e. of sufficient accuracy for the particular application, the results of the calibration are fed back into the model 95 at step 740. Put another way, the newly updated coefficients are placed into the model and are used thereafter for determining the visibility of media.

If it is determined at step 900 that the results are not sufficient for the particular application, the process is repeated. Before repeating the process, further variables may be included at step 750. In the examples illustrated above, five variables were used, namely FPSLPTIM, FPWAKTIM, FPNUMWAK, FPNUMSLP and FPSLPPRC. In some embodiments, a subset of these five variables may be deemed sufficiently accurate for a particular application. In some embodiments, more than five variables may be required to determine whether media on a web page is visible to the user. Other variables which may be used in the linear regression model include any suitable variable tracked by the system, including, but not limited to, the presence of mouse or other pointer movement, animation frame rate, clicks, page dwell time (how long a user stays on the page), and resource acquisition success/availability (such as in the case of 3-D hardware).
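The review-and-expand cycle of steps 900, 740 and 750 can be sketched as a loop over progressively larger variable sets, stopping at the first set whose agreement reaches the threshold. `evaluate` is assumed to fit and score a model on the named variables (for example by cross-validation). The extra variable names `MOUSEMOV` and `DWELLTIM` used in the usage example are hypothetical:

```python
from typing import Callable, Optional, Sequence, Tuple

FeatureSet = Tuple[str, ...]


def calibrate_with_expansion(candidate_sets: Sequence[FeatureSet],
                             evaluate: Callable[[FeatureSet], float],
                             threshold: float = 0.95) -> Optional[Tuple[FeatureSet, float]]:
    """Try variable sets in order until one meets the agreement threshold.

    evaluate(features) is assumed to fit the model on those variables and
    return its agreement with page analysis (a value between 0 and 1).
    """
    for features in candidate_sets:
        score = evaluate(features)
        if score >= threshold:
            return features, score  # step 740: accept; coefficients go to the model
        # step 750: not accurate enough, so retry with the next, larger set
    return None  # no candidate set was accurate enough
```

For example, with `base = ("FPSLPTIM", "FPWAKTIM", "FPNUMWAK", "FPNUMSLP", "FPSLPPRC")` the candidates might be `[base, base + ("MOUSEMOV",), base + ("MOUSEMOV", "DWELLTIM")]`, so variables are only added when the smaller set falls short.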

Those skilled in the art will appreciate that while the foregoing has described what are considered to be the best mode and, where appropriate, other modes of performing the invention, the invention should not be limited to specific apparatus configurations or method steps disclosed in this description of the preferred embodiment. It is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings. Those skilled in the art will recognize that the invention has a broad range of applications, and that the embodiments may take a wide range of modifications without departing from the inventive concept as defined in the appended claims.

We claim:
 1. A method of determining whether media displayed on a web page in a web browser is visible to a user, the method comprising: determining whether the media is displayed in at least one frame of the web page; and if the media is displayed in at least one frame, monitoring and analysing activity information from a user terminal to determine whether the media is visible to the user.
 2. A method according to claim 1, wherein said activity information comprises information relating to the activity of a media player.
 3. A method according to claim 2, wherein the activity information is input into a model; and wherein the model provides an estimation of whether the media is visible based upon the information relating to the activity of the media player.
 4. A method according to claim 3, wherein the model is a numerical model.
 5. A method according to claim 4, wherein the numerical model comprises a probabilistic model.
 6. A method according to claim 4, wherein the numerical model comprises a regression analysis.
 7. A method according to claim 4, wherein the coefficients of the numerical model are determined using training data.
 8. A method according to claim 2, wherein the activity information relates to the activity of the media player and comprises a frame rate of the media player.
 9. A method according to claim 2, wherein the activity information relates to the activity of the media player.
 10. A method according to claim 9, wherein the activity information comprises data on a sleep mode of the media player.
 11. A method according to claim 10, wherein determining whether the media player is in a sleep mode comprises determining whether the frame rate is below a predetermined threshold.
 12. A method according to claim 10, wherein the activity information comprises total time that the media player spent in sleep mode.
 13. A method of calibrating a model for determining whether media displayed on a web page in a web browser is visible when the web page comprises at least one frame; the method comprising using information obtained from at least one media displayed in at least one web page which does not comprise at least one frame.
 14. A method according to claim 13, wherein the model is a numerical model.
 15. A method according to claim 13, wherein the information obtained from the at least one media displayed in at least one web page which does not comprise at least one frame comprises information obtained by page analysis.
 16. A method according to claim 15, wherein page analysis comprises: determining the relative location of the media on the web page; and determining from the relative location of the media whether the media is visible.
 17. A method according to claim 15, wherein the page analysis comprises determining the dimensions of the active window of the web browser.
 18. A method according to claim 13, wherein the method comprises monitoring and analysing activity information from a user terminal.
 19. A method according to claim 18, wherein the activity information is input into the model; and wherein the model provides an estimation of whether the media is visible based upon the activity information.
 20. A method according to claim 18, wherein the numerical model comprises a probabilistic model.
 21. A method according to claim 14, wherein the numerical model comprises generalised linear regression analysis.
 22. A method according to claim 14, wherein the coefficients of the numerical model are determined using training data.
 23. A method according to claim 22, wherein the coefficients are updated intermittently.
 24. A method according to claim 18, wherein the activity information comprises data on a frame rate of the media player.
 25. A method according to claim 18, wherein the activity information comprises data on a sleep mode of a media player.
 26. Computer apparatus arranged to determine whether media displayed in a window of a web browser on a remote user terminal is visible to a user of the terminal, the apparatus comprising: an interface configured to receive activity data from code running on a remote user terminal; computer code operable, when executed, to cause the computer to input said activity data into a model configured to provide an indication of whether the media displayed on the remote computer is visible to a user at the remote computer; and a calibration module arranged to receive training data to calibrate the activity data used in the model.
 27. Computer apparatus as in claim 26, wherein the activity data received from the code running on the remote user terminal comprises data selected from one or more of: frame rate data; sleep mode data; mouse or other pointer movement data; click or other selection data; page dwell time; resource availability data; other data capable of indicating that media within at least one frame is visible to a user.
 28. Computer apparatus as in claim 26, wherein the model is a numerical model.
 29. Computer apparatus as in claim 28, wherein the numerical model is implemented using regression coefficients derived from the activity data.
 30. Computer apparatus as in claim 29, wherein the regression coefficients have been calibrated based on training data from page analysis of instances without a frame.